This paper provides ANOVA inference for nonparametric local 
polynomial regression (LPR) in analogy with ANOVA tools for the 
classical linear regression model. A surprisingly simple and exact local 
ANOVA decomposition is established, and a local R-squared quan- 
tity is defined to measure the proportion of local variation explained 
by fitting LPR. A global ANOVA decomposition is obtained by inte- 
grating local counterparts, and a global R-squared and a symmetric 
projection matrix are defined. We show that the proposed projection 
matrix is asymptotically idempotent and asymptotically orthogonal 
to its complement, naturally leading to an F-test for testing for no 
effect. A by-product result is that the asymptotic bias of the "pro- 
jected" response based on local linear regression is of quartic order 
of the bandwidth. Numerical results illustrate the behaviors of the 
proposed R-squared and F-test. The ANOVA methodology is also 
extended to varying coefficient models. 

1. Introduction. Nonparametric regression methods such as local poly- 
nomial regression (LPR) (Fan and Gijbels [9], Wand and Jones [26]), smooth- 
ing splines (Eubank [8]) and penalized splines (Ruppert, Wand and Carroll 
[23]) are widely used to explore unknown trends in data analysis. Given the 
popularity of these methods, a set of analysis of variance (ANOVA) inference 
tools, analogous to those of linear models, will be very useful in providing 
interpretability for nonparametric curves. In this paper, we aim to develop 
ANOVA inference for LPR. Some of the work in this paper was motivated 
by the authors' consulting project experiences, where clients presented with 
a nonparametric smooth curve would frequently ask if there would be an 
ANOVA table explicitly summarizing the fitted curve by sums of squares,
degrees of freedom, and an F-test for no effect. In addition, we are interested
in exploring a geometric representation and establishing a projection view 
for LPR. 

Consider a simple bivariate case: data $(X_i, Y_i)$, $i = 1,\ldots,n$, are drawn
from the model

(1.1) $Y = m(X) + \sigma(X)\varepsilon,$

where $X$ and $\varepsilon$ are independent, and $\varepsilon$ has mean 0 and unit variance.
Typically one is interested in estimating the conditional mean,
$m(x) = E(Y \mid X = x)$, while the conditional variance is
$\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$. The theoretical ANOVA decomposition for (1.1) is

(1.2) $\mathrm{Var}(Y) = \int (m(x) - \mu_y)^2 f(x)\,dx + \int \sigma^2(x) f(x)\,dx,$

where $f(x)$ is the underlying density function for $X_1,\ldots,X_n$, and $\mu_y$ denotes
the unconditional expected value of $Y$. Below we review briefly some related
work on ANOVA inference for nonparametric regression.

In LPR literature, we are not aware of a sample ANOVA decomposition
for (1.2). A commonly used residual sum of squares (RSS) is
$\sum_{i=1}^n (Y_i - \hat m(X_i))^2$, where $\hat m(X_i)$ denotes a nonparametric estimate
for $m(X_i)$, $i = 1,\ldots,n$, but RSS is not associated with a valid ANOVA
decomposition, in the sense that generally
$\sum_{i=1}^n (Y_i - \bar Y)^2 \ne \sum_{i=1}^n (\hat m(X_i) - \bar Y)^2 + \sum_{i=1}^n (Y_i - \hat m(X_i))^2$,
where $\bar Y$ is the sample mean of $Y_i$'s. Ramil-Novo and Gonzalez-Manteiga [22]
established an ANOVA decomposition for smoothing splines with a bias term.
An ANOVA-related quantity is the R-squared, or the coefficient of
determination. Theoretically, it measures
$\eta^2 = 1 - E(\mathrm{Var}(Y \mid X))/\mathrm{Var}(Y) = \mathrm{Var}(E(Y \mid X))/\mathrm{Var}(Y)$.
Doksum and Samarov [5] suggested an estimate

(1.3) $R_\rho^2 = \dfrac{[n^{-1}\sum_i (\hat m(X_i) - \bar m)(Y_i - \bar Y)]^2}{[n^{-1}\sum_i (\hat m(X_i) - \bar m)^2]\,[n^{-1}\sum_i (Y_i - \bar Y)^2]},$

where $\bar m = n^{-1}\sum_i \hat m(X_i)$. However, the correlation-based $R_\rho^2$ does not
possess an ANOVA structure. For a local version of the R-squared measure,
see Bjerve and Doksum [3], Doksum et al. [4] and Doksum and Froda 
[6]. An attempt to provide an analogous projection matrix is the so-called
"smoother matrix" $S$, $n \times n$, so that $Sy = \hat m$ with $y = (Y_1,\ldots,Y_n)^T$ and
$\hat m = (\hat m(X_1),\ldots,\hat m(X_n))^T$. See, for example, Hastie and Tibshirani [13].
However, $S$ lacks the properties of a projection matrix; it is non-idempotent
and nonsymmetric in the case of local linear regression. Another essential
ANOVA element is the degree of freedom (DF). Hastie and Tibshirani [13]
discussed three versions: tr($S$), tr($S^TS$) and tr($2S - S^TS$), where "tr"
denotes the trace of a matrix. Zhang [27] gave asymptotic expressions on DF
for LPR. On testing for no effect, Azzalini, Bowman and Hardle [1], Hastie
and Tibshirani [13] and Azzalini and Bowman [2] introduced tests with the
F-type form of test statistics based on RSS. Fan, Zhang and Zhang [10]
established the generalized likelihood ratio test with an F-type test
statistic and an asymptotic chi-square distribution. Other F-flavor tests include
Gijbels and Rousson [12].

From the discussion above, we believe that there is a need to further 
investigate an ANOVA framework for LPR. Our focus on LPR arises nat- 
urally since it is a "local" least squares technique. A surprisingly simple 
local ANOVA decomposition is established in Section 2, leading naturally 
to defining a local R-squared. Then by integrating local counterparts, a 
global ANOVA decomposition is established, from which a global R-squared 
and a symmetric matrix H*, like a projection matrix, are defined. We note 
that the proposed global SSE (sum of squares due to error) is the same 
as the "smooth backfitting" error given in Mammen, Linton and Nielsen 
[19] and Nielsen and Sperlich [20] for estimation under generalized addi- 
tive models (Hastie and Tibshirani [13]). We show that when conditioned 
on {Xi, . . . ,Xn}, H* is asymptotically idempotent and H* and its comple- 
ment (I — H*) are asymptotically orthogonal, leading naturally to an F-test 
for testing no effect. A by-product is that the conditional bias of the
"projected" response $H^*y$ based on local linear regression is of order $h^4$,
with $h$ the bandwidth. To show that the ANOVA framework can be extended to the
multivariate case, expressions of local and global ANOVA decomposition are 
derived for varying coefficient models (VCM) (Hastie and Tibshirani [14]) 
in Section 3. Section 4 contains numerical results on the performance of the 
proposed global R-squared and the F-test for no effect. In summary, our 
results are under one framework containing all essential ANOVA elements: 
(i) a local exact ANOVA decomposition, (ii) a local R-squared, (iii) a global 
ANOVA decomposition, (iv) a global R-squared, (v) an asymptotic projection
matrix $H^*$, (vi) nonparametric degree of freedom defined by tr($H^*$)
and (vii) an F-test for testing no effect. The results also give new insights
into LPR being a "calculus" extension of classical polynomial models and
provide a new geometric view on LPR highlighted by $H^*$. Extension of the
ANOVA inference to partially linear models, generalized additive models 
and semiparametric models is in progress. 

2. ANOVA for local polynomial regression. We begin by introducing 
LPR (Fan and Gijbels [9], Wand and Jones [26]) under (1.1). Assume that 
locally for data $X_i$'s in a neighborhood of $x$, $m(X_i)$ can be approximated
by $m(x) + m'(x)(X_i - x) + \cdots + m^{(p)}(x)(X_i - x)^p/p!$, based on a Taylor
expansion. Then this local trend is fitted by weighted least squares as the
following:

(2.1) $\min_{\beta}\; n^{-1}\sum_{i=1}^n \Bigl(Y_i - \sum_{j=0}^p \beta_j (X_i - x)^j\Bigr)^2 K_h(X_i - x),$

where $\beta = (\beta_0,\ldots,\beta_p)^T$, $K_h(\cdot) = K(\cdot/h)/h$, and the dependence of $\beta$ on $x$
and $h$ is suppressed. The function $K(\cdot)$ is a nonnegative weight function,
typically a symmetric probability density function, and $h$ is the smoothing
parameter, determining the neighborhood size for local fitting. Let
$\hat\beta = (\hat\beta_0,\ldots,\hat\beta_p)^T$ denote the solution to (2.1). It is clear that $\hat\beta_0$
estimates $m(x)$ of interest and $j!\,\hat\beta_j$ estimates the $j$th derivative $m^{(j)}(x)$,
$j = 1,\ldots,p$. For convenience of developing ANOVA inference in this paper, we
define a local SSE as the minimized value of (2.1) divided by the normalized
sum of local weights:

(2.2) $SSE_p(x;h) = \dfrac{n^{-1}\sum_{i=1}^n \bigl(Y_i - \sum_{j=0}^p \hat\beta_j (X_i - x)^j\bigr)^2 K_h(X_i - x)}{n^{-1}\sum_{i=1}^n K_h(X_i - x)}.$

The denominator of (2.2) is the kernel density estimator $\hat f(x;h)$ (Silverman
[24]) for $f(x)$. Similar treatment can be found in Qiu [21], so that $SSE_p(x;h)$
estimates $\sigma^2(x)$. We note that (2.2) is equivalent to the SSE for weighted
least squares regression given in Draper and Smith [7].
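
The local fit (2.1) and the local SSE (2.2) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names `epanechnikov` and `local_poly_fit`, the simulated sine curve, and the bandwidth value are all assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_poly_fit(x0, X, Y, h, p=1, kernel=epanechnikov):
    """Solve the weighted least squares problem (2.1) at x0.

    Returns (beta_hat, sse), where sse is the local SSE in (2.2).
    """
    w = kernel((X - x0) / h) / h                       # K_h(X_i - x0)
    D = np.vander(X - x0, N=p + 1, increasing=True)    # columns (X_i - x0)^j
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt-weighted data.
    beta, *_ = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)
    resid = Y - D @ beta
    f_hat = w.mean()                                   # kernel density estimate
    sse = np.mean(resid**2 * w) / f_hat
    return beta, sse

# Hypothetical data: a sine curve with noise standard deviation 0.3.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(200)

beta_hat, sse = local_poly_fit(0.5, X, Y, h=0.2, p=1)
print("beta_0 (estimate of m(0.5)):", beta_hat[0])
print("local SSE (estimate of sigma^2(0.5)):", sse)
```

In this sketch `beta_hat[0]` plays the role of $\hat\beta_0$ and `sse` should be roughly comparable to the noise variance 0.09, up to smoothing bias and sampling error.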

Recall that in the linear regression setting, the sample ANOVA decomposition
is given as $SST = n^{-1}\sum_i (Y_i - \bar Y)^2 = n^{-1}\sum_i (\hat Y_i - \bar Y)^2 + n^{-1}\sum_i (Y_i - \hat Y_i)^2 = SSR + SSE$,
where $\hat Y_i$'s denote fitted values for $Y_i$'s from a linear
model, SST the corrected sum of squares for $Y_i$'s, and SSR the sum of
squares due to regression. In the literature of weighted least squares
regression (e.g., Draper and Smith [7]) with weight $w_i$ assigned to $(X_i, Y_i)$, the
sample ANOVA decomposition is

(2.3) $\sum_i (Y_i - \bar Y_w)^2 w_i = \sum_i (\hat Y_{i,w} - \bar Y_w)^2 w_i + \sum_i (Y_i - \hat Y_{i,w})^2 w_i,$

where $\bar Y_w = \sum_i Y_i w_i / \sum_i w_i$ and $\hat Y_{i,w}$ is the resulting fitted value for $Y_i$.

2.1. Local ANOVA decomposition and a pointwise R-squared. The local
least squares feature of LPR leads us to consider whether an analogous local
(pointwise) ANOVA decomposition exists. We note that it is not suitable to
adopt (2.3) directly. By forcing a local fit of $\bar Y$, we obtain a finite-sample
and exact local ANOVA decomposition in Theorem 1 for LPR. In addition
to $SSE_p(x;h)$ in (2.2), local SST and local SSR are defined as follows:

(2.4) $SST(x;h) = \dfrac{n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2 K_h(X_i - x)}{\hat f(x;h)}, \qquad SSR_p(x;h) = \dfrac{n^{-1}\sum_{i=1}^n \bigl(\sum_{j=0}^p \hat\beta_j (X_i - x)^j - \bar Y\bigr)^2 K_h(X_i - x)}{\hat f(x;h)}.$

Note that both $SSE_p(x;h)$ and $SSR_p(x;h)$ use all the fitted parameters $\hat\beta_j$'s,
in contrast to RSS using only $\hat\beta_0$.

Theorem 1. An exact and finite-sample ANOVA decomposition is obtained
for local polynomial fitting at a grid point $x$ in the range of $X_i$'s:

(2.5) $SST(x;h) = SSE_p(x;h) + SSR_p(x;h).$

In addition, $SSE_1(x;h)$ for local linear regression ($p = 1$) is related to the
weighted least squared error of the Nadaraya-Watson estimator ($p = 0$), as
given below:

(2.6) $SSE_1(x;h) = \dfrac{n^{-1}\sum_i (Y_i - \hat m_{NW}(x))^2 K_h(X_i - x)}{\hat f(x;h)} - \hat\beta_1^2\,\dfrac{n^{-1}\sum_i (X_i - \bar X_k)^2 K_h(X_i - x)}{\hat f(x;h)},$

where $\hat m_{NW}(x) = n^{-1}\sum_i K_h(X_i - x) Y_i / \hat f(x;h)$ and
$\bar X_k = n^{-1}\sum_i X_i K_h(X_i - x) / \hat f(x;h)$.

The proof of Theorem 1 is mainly algebraic and hence is omitted; (2.6)
is simply (2.3). The "exact" expression (2.5) is very attractive and has an
appealing interpretation of comparing the local fit with the simple no-effect
$\bar Y$ in the same local scale. It is easy to see that $SSR_p(x;h)$ estimates
$(m(x) - \mu_y)^2$ and $SSE_p(x;h)$ estimates $\sigma^2(x)$.
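
The exactness claimed in Theorem 1 can be checked numerically. The sketch below is an illustration under assumptions (simulated bump-shaped data, an Epanechnikov kernel, and the hypothetical helper `local_anova`); it evaluates the local sums of squares at one point and verifies both the decomposition (2.5) and the relation (2.6) up to floating-point error.

```python
import numpy as np

def kern(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_anova(x0, X, Y, h, p):
    """Return (SST, SSE_p, SSR_p, beta_hat) at the point x0."""
    w = kern((X - x0) / h) / h                       # K_h(X_i - x0)
    f_hat = w.mean()                                 # kernel density estimate
    D = np.vander(X - x0, N=p + 1, increasing=True)  # local polynomial design
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)
    fit, ybar = D @ beta, Y.mean()
    sst = np.mean((Y - ybar) ** 2 * w) / f_hat
    sse = np.mean((Y - fit) ** 2 * w) / f_hat
    ssr = np.mean((fit - ybar) ** 2 * w) / f_hat
    return sst, sse, ssr, beta

# Hypothetical data, used only for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 150)
Y = 2 - 5 * (X - np.exp(-100 * (X - 0.5) ** 2)) + 0.5 * rng.standard_normal(150)
x0, h = 0.4, 0.22

sst, sse1, ssr1, beta1 = local_anova(x0, X, Y, h, p=1)   # local linear
_, sse0, ssr0, beta0 = local_anova(x0, X, Y, h, p=0)     # Nadaraya-Watson

gap_25 = sst - (sse1 + ssr1)                     # (2.5): should vanish

# (2.6): SSE_1 equals the Nadaraya-Watson SSE minus a slope correction.
w = kern((X - x0) / h) / h
xbar_k = np.sum(X * w) / np.sum(w)
slope_term = beta1[1] ** 2 * np.sum((X - xbar_k) ** 2 * w) / np.sum(w)
gap_26 = sse1 - (sse0 - slope_term)

r2_local_linear = 1.0 - sse1 / sst               # local R-squared, p = 1
r2_nadaraya_watson = 1.0 - sse0 / sst            # local R-squared, p = 0
print(gap_25, gap_26, r2_local_linear, r2_nadaraya_watson)
```

Because the slope correction in (2.6) is nonnegative, the local linear R-squared in this sketch can never fall below the Nadaraya-Watson one for the same kernel and bandwidth.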

Based on (2.5), we define a local (pointwise) R-squared at $x$ as follows:

(2.7) $R_p^2(x;h) = 1 - \dfrac{SSE_p(x;h)}{SST(x;h)} = \dfrac{SSR_p(x;h)}{SST(x;h)}.$

From Theorem 1, $R_p^2(x;h)$ is always between 0 and 1, and by (2.6) $R_1^2(x;h)$ for
local linear regression is never smaller than $R_0^2(x;h)$ for the Nadaraya-Watson
estimator with the same bandwidth and kernel function. A plot of $R_p^2(x;h)$
versus $x$ will give an idea of the quality of estimation at different regions of
data. $R_p^2(x;h)$ is a measure for the proportion of local variation explained by
the local polynomial fit. We note that $R_p^2(x;h)$ is invariant with respect to
linear transformations of $Y_i$'s, and will be invariant for linear transformations
of $X_i$'s ($aX_i + b$) if the bandwidth is taken proportional to the
transformation, $ah$, accordingly. The classical R-squared for polynomial models can
be viewed as a special case of (2.7), when using the uniform kernel at only
one grid point $\bar X$. Thus LPR, fitting local polynomials across data, is like a
calculus extension of classical polynomial models.



2.2. Global ANOVA decomposition and coefficient of determination. We
now turn to developing a global ANOVA decomposition. It is convenient to
introduce some conditions here.

Conditions (A).

(A1) The design density $f(x)$ is bounded away from 0 and $\infty$, and $f(x)$ has
a continuous second derivative on a compact support.

(A2) The kernel $K(\cdot)$ is a Lipschitz continuous, bounded and symmetric
probability density function, having a support on a compact interval,
say $[-1, 1]$.

(A3) The error $\varepsilon$ is from a symmetric distribution with mean 0, variance 1,
and a finite fourth moment.

(A4) The $(p + 1)$st derivative of $m(\cdot)$ exists.

(A5) The conditional variance $\sigma^2(\cdot)$ is bounded and continuous.

Based on (2.5), a global ANOVA decomposition can be established by
integrating local counterparts in (2.5):

(2.8) $SST(h) = \int SST(x;h)\,\hat f(x;h)\,dx,$
      $SSE_p(h) = \int SSE_p(x;h)\,\hat f(x;h)\,dx,$
      $SSR_p(h) = \int SSR_p(x;h)\,\hat f(x;h)\,dx.$

Then a global ANOVA decomposition is

(2.9) $SST = SSE_p(h) + SSR_p(h),$

which corresponds to the theoretical version (1.2). Since $\int K_h(X_i - x)\,dx = 1$
under Conditions (A1) and (A2), $\int SST(x;h)\,\hat f(x;h)\,dx = n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2 = SST$
in (2.9), so the global SST is free of the bandwidth. We then define a global
R-squared as

(2.10) $R_p^2(h) = 1 - \dfrac{SSE_p(h)}{SST} = \dfrac{SSR_p(h)}{SST},$

and we name it the "ANOVA" R-squared. We further investigate some
asymptotic properties of $R_p^2(h)$. For simplicity, we focus on the case of an
odd degree, for example, $p = 1$, in Theorem 2. A by-product is that $SSE_p(h)$
is a $\sqrt{n}$-consistent estimate for $\sigma^2$ when assuming homoscedasticity.
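
One way to approximate the integrals in (2.8) is a quadrature rule over a grid of $x$ values. The sketch below is an assumption-laden illustration, not the authors' code: the helper names, the trapezoid rule, and the choice to extend the grid by $h$ beyond the data range (so that each $K_h(X_i - x)$ integrates to roughly one) are all choices made here for the example.

```python
import numpy as np

def kern(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def trapezoid(values, grid):
    """Trapezoid rule on an equally spaced grid."""
    dx = grid[1] - grid[0]
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))

def global_anova(X, Y, h, p=1, n_grid=400):
    """Approximate SSE_p(h) and SSR_p(h) in (2.8) by numerical integration."""
    grid = np.linspace(X.min() - h, X.max() + h, n_grid)
    ybar = Y.mean()
    sse_x = np.empty(n_grid)
    ssr_x = np.empty(n_grid)
    for g, x0 in enumerate(grid):
        w = kern((X - x0) / h) / h                       # K_h(X_i - x0)
        D = np.vander(X - x0, N=p + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)
        fit = D @ beta
        # Local sum of squares times f_hat(x;h): the density estimate cancels.
        sse_x[g] = np.mean((Y - fit) ** 2 * w)
        ssr_x[g] = np.mean((fit - ybar) ** 2 * w)
    return trapezoid(sse_x, grid), trapezoid(ssr_x, grid)

# Hypothetical data: a sine curve with noise standard deviation 0.3.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(200)

sse, ssr = global_anova(X, Y, h=0.15)
sst = np.mean((Y - Y.mean()) ** 2)       # bandwidth-free global SST
r2_anova = 1.0 - sse / sst               # the "ANOVA" R-squared (2.10)
print(sse, ssr, sst, r2_anova)
```

In this sketch `sse + ssr` agrees with `sst` only up to quadrature error, since the identity (2.9) relies on the kernel weights integrating to one over the grid.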

Theorem 2. Assume that as $n \to \infty$, $h = h(n) \to 0$. When fitting LPR
with an odd $p$, under Conditions (A) with $nh^{2p+2} \to 0$ and $nh^2 \to \infty$:

(a) The asymptotic conditional bias of $R_p^2(h)$ is

$-h^2\,\dfrac{\mu_2}{2\sigma_y^2} \int \sigma^2(x) f''(x)\,dx\,(1 + o_P(1)).$

(b) The asymptotic conditional variance of $R_p^2(h)$ is 

„-.(^^(..,x))(/,.„.,..)*) . ""--'"4"^'^>'0 

X (l + op(l)), 

where $\sigma_y^2$ is the variance of $Y$, $m_4 = E\{(Y - \mu_y)^4\}$ is the fourth central
moment of $Y$, and $K_0^*(v) = \int K(u)K(v - u)\,du$ denotes the convolution of
$K$ and itself.

(c) Under the homoscedastic assumption and conditioned on $\{X_1,\ldots,X_n\}$,
$R_p^2(h)$ converges in distribution to a normal distribution with the above
asymptotic conditional bias and variance.

(d) Under the assumptions in (c), $SSE_p(h)$ is a $\sqrt{n}$-consistent estimate
for $\sigma^2$. Its asymptotic conditional bias is $o_P(n^{-1/2})$ if $\int f''(x)\,dx = 0$ and its
asymptotic conditional variance is $n^{-1}\sigma^4 (\int K_0^*(v)\,dv)(1 + o_P(1))$.

Theorem 2 is a special case of Theorem 6 in Section 3, and hence the proof
of Theorem 6 (in the Appendix) is applicable to Theorem 2. The condition
on the bandwidth in Theorem 2 becomes $h = o(n^{-1/4})$ and $n^{-1/2} = o(h)$ for
the case of $p = 1$. It is known that the optimal bandwidth for estimating $m(\cdot)$
with $p = 1$ is of order $n^{-1/5}$ (e.g., Fan and Gijbels [9]). It is not surprising
that we need a smaller bandwidth than the rate of $n^{-1/5}$ to obtain a
$\sqrt{n}$-consistent estimate for $\sigma^2$.

2.3. Asymptotic projection matrix. Under Conditions (A1) and (A2),
$SSE_p(h)$ and $SSR_p(h)$ can be rewritten as

$SSE_p(h) = n^{-1}\Bigl\{\sum_i Y_i^2 - \int \sum_i \Bigl(\sum_j \hat\beta_j(x)(X_i - x)^j\Bigr)^2 K_h(X_i - x)\,dx\Bigr\},$

$SSR_p(h) = n^{-1}\Bigl\{\int \sum_i \Bigl(\sum_j \hat\beta_j(x)(X_i - x)^j\Bigr)^2 K_h(X_i - x)\,dx - n\bar Y^2\Bigr\},$

and in a matrix expression,

(2.11) $SSE_p(h) = n^{-1} y^T (I - H^*) y, \qquad SSR_p(h) = n^{-1} y^T (H^* - L) y,$

where $L$ is an $n \times n$ matrix with entries $1/n$. In this subsection, we further
explore if $H^*$ behaves like a projection matrix. The $H^*$ matrix can be
written as $H^* = \int W H \hat f(x;h)\,dx$, where $W$ is a diagonal matrix with entries
$K_h(X_i - x)/\hat f(x;h)$, $H = X(X^T W X)^{-1} X^T W$ is the local projection matrix
for (2.1) with $X$ the design matrix for (2.1), and the integration is performed
element by element in the resulting matrix product. $H^*$ depends on
the data points $X_i$'s, kernel function $K$ and the bandwidth $h$. Under
Conditions (A1) and (A2), $H^*\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ denotes an $n$-vector of 1's. Therefore
the projected response $H^*y = y^*$ is a vector with each element $Y_i^*$ being a
weighted average of $Y_i$'s. The matrix $H^*$ is clearly a symmetric matrix, but
it is not idempotent. Given this fact, we take a step back to explore if $H^*$
is asymptotically idempotent when conditioned on $\{X_1,\ldots,X_n\}$.
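
The stated properties of $H^*$ (symmetry, rows summing to one, lack of exact idempotency) can be inspected on a small simulated design. The following sketch is an assumption-based illustration: the function `h_star`, the trapezoid quadrature, and the grid extended by $h$ beyond the data range are choices made here. It uses the identity $W H \hat f(x;h) = K^{1/2} P K^{1/2}$, where $K$ is the diagonal matrix of $K_h(X_i - x)$ and $P$ projects onto the column space of $K^{1/2}X$.

```python
import numpy as np

def kern(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def h_star(X, h, p=1, n_grid=300):
    """Approximate H* = int W H f_hat(x;h) dx by the trapezoid rule."""
    n = X.size
    grid = np.linspace(X.min() - h, X.max() + h, n_grid)
    dx = grid[1] - grid[0]
    H = np.zeros((n, n))
    for g, x0 in enumerate(grid):
        sk = np.sqrt(kern((X - x0) / h) / h)      # sqrt of K_h(X_i - x0)
        A = sk[:, None] * np.vander(X - x0, N=p + 1, increasing=True)
        P = A @ np.linalg.pinv(A)                  # projection onto col(A)
        quad_w = 0.5 * dx if g in (0, n_grid - 1) else dx
        # Integrand W H f_hat written as K^{1/2} P K^{1/2}.
        H += quad_w * (sk[:, None] * P * sk[None, :])
    return H

# Hypothetical design points.
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0.0, 1.0, 60))
H = h_star(X, h=0.2)

row_sums = H.sum(axis=1)               # H* 1 should be close to 1
trace = np.trace(H)                    # candidate degrees of freedom tr(H*)
idem_gap = np.abs(H @ H - H).max()     # nonzero: H* is not idempotent
print(row_sums.min(), row_sums.max(), trace, idem_gap)
```

The pseudo-inverse is used so that grid points with few or no observations in the kernel window do not break the computation; this is a numerical convenience of the sketch, not something taken from the text.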

The authors are not aware of standard criteria of asymptotic idempotency. 
Below we define a criterion for asymptotic idempotency and asymptotic 
orthogonality in a nonparametric regression setting: 

Definition. 1. Conditioned on $\{X_1,\ldots,X_n\}$, an $n \times n$ matrix $A_n$ is
asymptotically idempotent, if for any random $n$-vector response $y$ with
finite expected value, $E\{(A_n - A_n^2)y \mid X_1,\ldots,X_n\}$ tends to a zero vector in
probability as $n \to \infty$, that is, each element of $E\{(A_n - A_n^2)y \mid X_1,\ldots,X_n\}$
is asymptotically zero in probability as $n \to \infty$.

2. Conditioned on $\{X_1,\ldots,X_n\}$, two $n \times n$ matrices $A_n$ and $B_n$
are asymptotically orthogonal, if for any random $n$-vector response $y$ with
finite expected value, $E\{A_n B_n y \mid X_1,\ldots,X_n\}$ tends to a zero vector in
probability as $n \to \infty$, that is, each element of $E\{A_n B_n y \mid X_1,\ldots,X_n\}$ is
asymptotically zero in probability as $n \to \infty$.

Denote the multiplier for $h^{p+1}$ ($p$ odd) or $h^{p+2}$ ($p$ even) in the first-order
term for the conditional bias of $\hat\beta_0(x;h,p)$ as $b_{0,p}(x)$ (see Wand and Jones
[26], page 125). The following theorem gives the rate of each element in
$(H^* - H^{*2})y$.



Theorem 3. Under Conditions (A), suppose local polynomial regression
of order $p$ is fitted to data. The bandwidth $h \to 0$ and $nh \to \infty$, as $n \to \infty$.

(a) For $p \ne 1$, the asymptotic conditional bias of $Y_i^*$,
$E\{Y_i^* - m(X_i) \mid X_1,\ldots,X_n\}$, for $i = 1,\ldots,n$, is at most

(2.12) $\begin{cases} O(h^{p+1})(1 + o_P(1)), & p \text{ is odd};\\ O(h^{p+2})(1 + o_P(1)), & p \text{ is even}. \end{cases}$

(b) For $p = 1$, the asymptotic conditional bias of $Y_i^*$, $i = 1,\ldots,n$, is of
order $h^4$; more explicitly

( Z. io ) 

+ Op{h'). 



(c) Each element in $E\{(H^* - H^{*2})y \mid X_1,\ldots,X_n\}$ is at most of order
$O(h^{p+1})(1 + o_P(1))$ for an odd $p$ with $p \ge 3$, at most $O(h^{p+2})(1 + o_P(1))$ if
$p$ is even, and $O(h^4)(1 + o_P(1))$ when $p = 1$. Thus, conditioned on $\{X_1,\ldots,X_n\}$,
$H^*$ is asymptotically idempotent and asymptotically a projection matrix.

(d) For local linear regression, the asymptotic conditional variance of $Y_i^*$
retains the order of $n^{-1}h^{-1}$:

(2.14) $\mathrm{Var}\{Y_i^* \mid X_1,\ldots,X_n\} = n^{-1}h^{-1}(1 + o_P(1))\,\kappa_0\,\sigma^2/f(X_i),$

where $\kappa_0 = \int K_0^{*2}(v)\,dv - \dfrac{2}{\mu_2}\int K_0^*(v)K_1^*(v)\,dv + \dfrac{1}{\mu_2^2}\int K_1^{*2}(v)\,dv$ with $K_1^*(\cdot)$ the
convolution of $uK(u)$ and itself.

Theorem 3(b) implies that using the matrix $H^*$, one can achieve a
surprising bias reduction effect for local linear regression, from the order of $h^2$
to $h^4$. While achieving bias reduction, the asymptotic conditional variance of
$Y_i^*$ increases in the case of local linear regression. We calculate the constant
term in (2.14) for the Epanechnikov and Gaussian kernel functions, and the
ratios of the constant factors of $Y_i^*$ and local linear estimator $\hat\beta_0(X_i)$ are
1.38 and 1.10, respectively. It is of interest to know the form of $y^*$,

(2.15) $H^*y = \begin{pmatrix} \int (\hat\beta_0(x) + \cdots + \hat\beta_p(x)(X_1 - x)^p) K_h(X_1 - x)\,dx \\ \vdots \\ \int (\hat\beta_0(x) + \cdots + \hat\beta_p(x)(X_n - x)^p) K_h(X_n - x)\,dx \end{pmatrix}.$

The projection $H^*y$ uses all the fitted $\hat\beta_j(x)$'s through integration and the
gain is reduction in the asymptotic bias. It is in contrast with $\hat\beta_0(X_i)$, which
fits local polynomial at $X_i$ and throws away other fitted parameters when
$p \ge 1$.

2.4. An F-test for testing no effect. Results in Theorem 3 naturally lead 
us to consider an F-test for testing no effect. The next theorem proposes an 
F-test that inherits properties of the classical F-tests. 

Theorem 4. Under the conditions in Theorem 3 and conditioned on
$\{X_1,\ldots,X_n\}$:

(a) $(I - H^*)$ and $(H^* - L)$ are asymptotically orthogonal, in the sense
that

$E\{(I - H^*)(H^* - L)y \mid X_1,\ldots,X_n\} = E\{(H^* - H^{*2})y \mid X_1,\ldots,X_n\},$

which tends to a zero vector in probability.

(b) Under the simple homoscedastic assumption, an F-statistic is formed
as

(2.16) $F = \dfrac{SSR_p/(\mathrm{tr}(H^*) - 1)}{SSE_p/(n - \mathrm{tr}(H^*))},$

where tr($H^*$) is the trace of $H^*$. Conditioned on $\{X_1,\ldots,X_n\}$, with the
normal error assumption, the F-statistic (2.16) is asymptotically F-distributed
with degrees of freedom $(\mathrm{tr}(H^*) - 1,\; n - \mathrm{tr}(H^*))$.

Table 1

ANOVA table for local polynomial regression

Source       Degree of freedom         Sum of squares                      Mean squares                                       F
Regression   $\mathrm{tr}(H^*) - 1$    $SSR_p = n^{-1}y^T(H^* - L)y$       $MSR_p = n\,SSR_p/(\mathrm{tr}(H^*) - 1)$          $MSR_p/MSE_p$
Residual     $n - \mathrm{tr}(H^*)$    $SSE_p = n^{-1}y^T(I - H^*)y$       $MSE_p = n\,SSE_p/(n - \mathrm{tr}(H^*))$
Total        $n - 1$                   $SST = n^{-1}y^T(I - L)y$

(c) The conditional trace of $H^*$ for local linear regression is
asymptotically

(2.17) $\mathrm{tr}(H^*) = h^{-1}|\Omega|(\nu_0 + \nu_2/\mu_2)(1 + o_P(1)),$

where $|\Omega|$ denotes the range of $X_i$'s and $\nu_j = \int u^j K^2(u)\,du$.

We remark that when a local $p$th polynomial approximation is exact,
$E\{(H^* - H^{*2})y \mid X_1,\ldots,X_n\} = 0$, that is, $H^*$ is idempotent and the resulting
F-statistic (2.16) has an exact F-distribution as in the classical settings.
Based on (2.8) and Theorems 3 and 4, an ANOVA table for LPR is given in
Table 1. It has been shown in Theorem 2(d) that $SSE_p(h)$ is a $\sqrt{n}$-consistent
estimate for $\sigma^2$ when the error variance is homoscedastic. Table 1 shows that
$MSE_p(h) = \frac{n}{n - \mathrm{tr}(H^*)}\,SSE_p(h)$ is an unbiased estimate for $\sigma^2$ in finite-sample
settings, which is similar to the classical MSE in linear models. With the
ANOVA table, an analogous adjusted R-squared may be defined as

(2.18) $R_{p,\mathrm{adj}}^2(h) = 1 - \dfrac{SSE_p(h)/(n - \mathrm{tr}(H^*))}{SST/(n - 1)}.$
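
The entries of the ANOVA table, the F-statistic (2.16) and the adjusted R-squared (2.18) can be assembled from $H^*$ as sketched below. This is a hedged illustration: `h_star` and `anova_table` are hypothetical helpers, $H^*$ is approximated by a trapezoid rule on an assumed grid, and the data are simulated. A p-value could then be read from an F distribution with the two degrees of freedom returned, which is left out here to keep the example dependent on NumPy only.

```python
import numpy as np

def kern(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def h_star(X, h, p=1, n_grid=300):
    """Trapezoid-rule approximation of H* (the grid choice is an assumption)."""
    grid = np.linspace(X.min() - h, X.max() + h, n_grid)
    dx = grid[1] - grid[0]
    H = np.zeros((X.size, X.size))
    for g, x0 in enumerate(grid):
        sk = np.sqrt(kern((X - x0) / h) / h)
        A = sk[:, None] * np.vander(X - x0, N=p + 1, increasing=True)
        P = A @ np.linalg.pinv(A)                  # projection onto col(A)
        quad_w = 0.5 * dx if g in (0, n_grid - 1) else dx
        H += quad_w * (sk[:, None] * P * sk[None, :])
    return H

def anova_table(X, Y, h, p=1):
    """Sums of squares, degrees of freedom, F and adjusted R-squared."""
    n = Y.size
    H = h_star(X, h, p)
    tr = np.trace(H)
    ybar = Y.mean()
    sst = np.mean((Y - ybar) ** 2)                 # n^{-1} y'(I - L)y
    sse = (Y @ Y - Y @ H @ Y) / n                  # n^{-1} y'(I - H*)y
    ssr = (Y @ H @ Y) / n - ybar**2                # n^{-1} y'(H* - L)y
    df_reg, df_res = tr - 1.0, n - tr
    return {
        "sst": sst, "sse": sse, "ssr": ssr,
        "df_reg": df_reg, "df_res": df_res,
        "F": (ssr / df_reg) / (sse / df_res),      # statistic (2.16)
        "r2": 1.0 - sse / sst,                     # ANOVA R-squared (2.10)
        "r2_adj": 1.0 - (sse / df_res) / (sst / (n - 1)),   # (2.18)
    }

# Hypothetical data with a clear signal.
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, 100)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(100)
table = anova_table(X, Y, h=0.2)
for key, value in table.items():
    print(f"{key:7s} {value: .4f}")
```

Because SSR and SSE are both computed from the same approximate $H^*$, they add up to SST exactly in this sketch, whatever the quadrature error in $H^*$ itself.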



3. Extension to varying coefficient models. In this section, we extend the 
ANOVA decomposition to VCM, illustrating that the ANOVA framework 
can be extended to the multivariate case. Though there is no room in this 
paper for a full discussion of VCM, we develop expressions for local and 
global ANOVA decomposition and the ANOVA R-squared in this section. 

The VCM assumes the following conditional linear structure:

(3.1) $Y = \sum_{k=1}^d a_k(U) X_k + \sigma(U)\varepsilon,$

where $X_1,\ldots,X_d$, $d \ge 1$, are the covariates with $X_1 = 1$,
$a(U) = (a_1(U),\ldots,a_d(U))^T$ is the functional coefficient vector, $U$ and $\varepsilon$ are
independent, and $\varepsilon$ has mean 0 and unit variance. Specifically, when $d = 1$,
model (3.1) is reduced to the bivariate nonparametric model (1.1). On the other
hand, if the varying coefficients are constants, that is, $a_k(U) = a_k$, $k = 1,\ldots,d$,
the model is the multivariate linear model. Based on (3.1), the theoretical
ANOVA decomposition is

(3.2) $\mathrm{Var}(Y) = \mathrm{Var}(E(Y \mid U, X_1,\ldots,X_d)) + E(\mathrm{Var}(Y \mid U, X_1,\ldots,X_d))$
      $= \int (a(u)^T x - \mu_y)^2 f(x \mid u) g(u)\,dx\,du + \int \sigma^2(u) g(u)\,du,$

where $g(u)$ denotes the underlying density function for $U$, and $f(x \mid u)$ the
underlying conditional density function of $x = (X_1,\ldots,X_d)^T$ given $u$.

Hoover et al. [15] and Fan and Zhang [11] applied LPR to estimate
the varying-coefficient function vector $a(U)$. Assume that the $(p + 1)$st-order
derivative of $a(U)$ exists, and data $(U_i, X_{i1},\ldots,X_{id}, Y_i)$, $i = 1,\ldots,n$,
are drawn from model (3.1). Based on a Taylor expansion, $a_k(U_i)$,
$i = 1,\ldots,n$, $k = 1,\ldots,d$, is approximated by
$\beta_{k,0}(u) + \beta_{k,1}(u)(U_i - u) + \cdots + \beta_{k,p}(u)(U_i - u)^p$, for $U_i$ in a neighborhood of a
grid point $u$. Then local polynomial estimator
$\hat\beta_k = (\hat\beta_{k,0},\ldots,\hat\beta_{k,p})^T$, $k = 1,\ldots,d$, for VCM can be
obtained by the following locally weighted least squares equation:

(3.3) $\min_{\beta}\; n^{-1}\sum_{i=1}^n \Bigl(Y_i - \sum_{k=1}^d \sum_{j=0}^p \beta_{k,j}(U_i - u)^j X_{ik}\Bigr)^2 K_h(U_i - u)/\hat g(u;h),$

where $\beta = (\beta_{1,0},\ldots,\beta_{1,p},\ldots,\beta_{d,0},\ldots,\beta_{d,p})^T$, and
$\hat g(u;h) = n^{-1}\sum_{i=1}^n K_h(U_i - u)$ denotes the kernel density estimate for $g(u)$. For
convenience, (3.3) and its solution are expressed in a matrix form. Let

$X_u = \begin{pmatrix} X_{11} & \cdots & X_{11}(U_1 - u)^p & \cdots & X_{1d} & \cdots & X_{1d}(U_1 - u)^p \\ \vdots & & \vdots & & \vdots & & \vdots \\ X_{n1} & \cdots & X_{n1}(U_n - u)^p & \cdots & X_{nd} & \cdots & X_{nd}(U_n - u)^p \end{pmatrix}_{n \times (p+1)d}$

and $W_u$ be an $n \times n$ diagonal matrix of weights with $i$th element
$K_h(U_i - u)/\hat g(u;h)$. Then the solution to (3.3) can be expressed as
$\hat\beta(u) = (X_u^T W_u X_u)^{-1} X_u^T W_u y$, and the local polynomial estimator for $a(u)$ is

$\hat a(u) = (I_d \otimes e_{(p+1),1}^T)(X_u^T W_u X_u)^{-1} X_u^T W_u y,$

where $\otimes$ denotes the Kronecker product and $e_{(p+1),k}$ is a $(p + 1)$-dimension
vector with 1 on the $k$th position and 0 elsewhere, and
$\hat a(u) = (\hat\beta_{1,0}(u),\ldots,\hat\beta_{d,0}(u))^T$.

Similarly to the bivariate case, Theorem 5 gives the local finite-sample 
ANOVA decomposition for VCM. 

Theorem 5. Under model (3.1), an exact and finite-sample ANOVA
decomposition is obtained for local polynomial fitting at a grid point $u$:

$SST(u;h) = \dfrac{n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2 K_h(U_i - u)}{\hat g(u;h)}$

$= \dfrac{n^{-1}\sum_{i=1}^n (Y_i - \hat Y_i(u))^2 K_h(U_i - u)}{\hat g(u;h)} + \dfrac{n^{-1}\sum_{i=1}^n (\hat Y_i(u) - \bar Y)^2 K_h(U_i - u)}{\hat g(u;h)}$

$= SSE_p(u;h) + SSR_p(u;h),$

where $\hat Y_i(u) = e_{n,i}^T X_u (X_u^T W_u X_u)^{-1} X_u^T W_u y = \sum_{k=1}^d \sum_{j=0}^p \hat\beta_{k,j}(U_i - u)^j X_{ik}$ with
$e_{n,i}$ an $n$-dimension vector with 1 at the $i$th position and 0 elsewhere.
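
The local fit (3.3) and the decomposition in Theorem 5 can be illustrated on simulated data. In the sketch below the coefficient functions, sample size, bandwidth, and the helper `vcm_local_fit` are all assumptions for the example; the design matrix is built with the column ordering of $\beta$ used in the text, so the intercepts $\hat\beta_{k,0}$ are the first entry of each block.

```python
import numpy as np

def kern(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def vcm_local_fit(u0, U, Xcov, Y, h, p=1):
    """Local polynomial fit for a varying coefficient model at u0.

    Xcov is n x d with a first column of ones. Returns the estimated
    coefficient vector a_hat(u0) and the local (SST, SSE, SSR).
    """
    n, d = Xcov.shape
    w = kern((U - u0) / h) / h                               # K_h(U_i - u0)
    powers = np.vander(U - u0, N=p + 1, increasing=True)     # n x (p+1)
    # Row i: X_i1*(1, ..., (U_i-u0)^p), ..., X_id*(1, ..., (U_i-u0)^p).
    Xu = (Xcov[:, :, None] * powers[:, None, :]).reshape(n, d * (p + 1))
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xu * sw[:, None], Y * sw, rcond=None)
    a_hat = beta.reshape(d, p + 1)[:, 0]                     # beta_{k,0}, k = 1..d
    fit, ybar, g_hat = Xu @ beta, Y.mean(), w.mean()
    sst = np.mean((Y - ybar) ** 2 * w) / g_hat
    sse = np.mean((Y - fit) ** 2 * w) / g_hat
    ssr = np.mean((fit - ybar) ** 2 * w) / g_hat
    return a_hat, (sst, sse, ssr)

# Hypothetical model: a_1(u) = sin(2 pi u), a_2(u) = 1 + u^2, sigma = 0.2.
rng = np.random.default_rng(5)
n = 400
U = rng.uniform(0.0, 1.0, n)
Xcov = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = (np.sin(2 * np.pi * U) * Xcov[:, 0] + (1 + U**2) * Xcov[:, 1]
     + 0.2 * rng.standard_normal(n))

a_hat, (sst, sse, ssr) = vcm_local_fit(0.5, U, Xcov, Y, h=0.2, p=1)
print("a_hat(0.5):", a_hat)          # true values in this simulation: (0, 1.25)
print("SST - (SSE + SSR):", sst - (sse + ssr))
```

The decomposition is exact here because the design contains a constant column ($X_1 = 1$ with $j = 0$), so the weighted residuals are orthogonal to both the fitted values and $\bar Y$.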

The ANOVA decomposition in Theorem 5 extends the bivariate ANOVA
decomposition (2.5) to VCM in a straightforward way. A global ANOVA
decomposition can be constructed by integrating the local counterparts in
Theorem 5:

(3.4) $SST = SSE_p(h) + SSR_p(h),$

where

(3.5) $SST = \int SST(u;h)\,\hat g(u;h)\,du = n^{-1} y^T (I - L) y,$
      $SSE_p(h) = \int SSE_p(u;h)\,\hat g(u;h)\,du = n^{-1} y^T (I - H^*) y,$
      $SSR_p(h) = \int SSR_p(u;h)\,\hat g(u;h)\,du = n^{-1} y^T (H^* - L) y,$

where $H^* = \int W_u H_u \hat g(u;h)\,du$ is a symmetric $n \times n$ matrix with
$H_u = X_u (X_u^T W_u X_u)^{-1} X_u^T W_u$. The matrix expression in the right-hand side of (3.5) is
derived under Conditions (B1) below and (A2), and similarly to Section 2, SST
is free of the bandwidth. Then a global R-squared for VCM is defined as

(3.6) $R_p^2(h) = 1 - \dfrac{SSE_p(h)}{SST} = \dfrac{SSR_p(h)}{SST}.$

To investigate the asymptotic properties of the global ANOVA R-squared
(3.6), we impose Conditions (A2), (A3), (A5), and the following technical
conditions:

Conditions (B). 

(B1) The second derivative of the density $g(u)$ is bounded, continuous, and
square integrable on a compact support.
(B2) The $(p + 1)$st derivative of $a_j(\cdot)$, $j = 1,\ldots,d$, exists.
(B3) EXj" < oo, for some s > 2, j = 1, . . . ,p. 

(B4) Let $\gamma_{ij}(u) = E(X_i X_j \mid U = u)$, $i, j = 1,\ldots,d$; $\gamma_{ij}(\cdot)$ is continuous in a
neighborhood of $u$.

Now, we state the asymptotic normality for the global ANOVA R-squared
(3.6) in the following theorem and its proof is given in the Appendix.

Theorem 6. Assume that as $n \to \infty$, $h = h(n) \to 0$. When fitting LPR
with an odd $p$, under Conditions (A2), (A3), (A5) and (B1)-(B4), with
$nh^{2p+2} \to 0$ and $nh^2 \to \infty$:

(a) The asymptotic conditional bias of $R_p^2(h)$ is

$-h^2\,\dfrac{\mu_2}{2\sigma_y^2} \int \sigma^2(u) g''(u)\,du\,(1 + o_P(1)).$

(b) The asymptotic conditional variance of Rp{h) is 
_i/Var(£2) f r \ , {m,-al){E{a^{U))f ^ 

X (l + Op(l)). 

(c) Under the homoscedastic assumption and conditioned on $\{X_1,\ldots,X_n\}$,
$R_p^2(h)$ converges in distribution to a normal distribution with the above
asymptotic conditional bias and variance.

(d) Under the assumptions in (c), $SSE_p(h)$ for VCM is a $\sqrt{n}$-consistent
estimate for $\sigma^2$. Its asymptotic conditional bias is $o_P(n^{-1/2})$ if $\int g''(u)\,du = 0$
and the asymptotic conditional variance is $n^{-1}\sigma^4 (\int K_0^*(v)\,dv)(1 + o_P(1))$.

Theorem 6 extends Theorem 2 to VCM. Other ANOVA results for VCM,
such as degree of freedom, testing against $H_0: a_k(U) = c$ for some $k$ with $c$
a constant, and testing for overall model significance, will be derived in a
separate paper.

4. Numerical results. In this section, we use computer simulations to 
investigate the performance of the ANOVA R-squared and the proposed 
F-test. 

4.1. Simulation results for the ANOVA R-squared. Two examples from
Doksum and Froda [6] are used to compare the performance between the
ANOVA R-squared (2.10), the adjusted ANOVA R-squared (2.18), the
correlation R-squared (1.3), and an empirical RSS-related R-squared
$R_s^2 = 1 - RSS/\sum_i (Y_i - \bar Y)^2$. For comparison only, we also include the R-squared from
fitting a simple linear model. Sample sizes of $n = 50$ and 200 are used with
400 simulations. Following Doksum and Froda [6], we use a fixed bandwidth
$h = 0.22$ (approximately 0.7 times the standard deviation of $X$ in the
examples). The purpose is to see how the four coefficients of determination differ
from one another when the same amount of smoothing is applied. Local
linear regression with the Epanechnikov kernel $K(u) = 0.75(1 - u^2) I_{|u| \le 1}$ is
applied and 200 equally spaced grid points on $(\min_i X_i, \max_i X_i)$ are used
to approximate the integration for $R_1^2(h)$ and $R_{1,\mathrm{adj}}^2(h)$. No special
treatment for boundary points is implemented for any of the four nonparametric
R-squared's.

Example 1. Bump model: $Y = 2 - 5(X - e^{-100(X - 0.5)^2}) + \sigma\varepsilon$, where $X$
follows a Uniform(0, 1) and the distribution of $\varepsilon$ is $N(0, 1)$. $X$ and $\varepsilon$ are
independent, and $\sigma = 0.5, 1, 2, 4$ results in high to low values for the true
value of the coefficient of determination. The results show that the four
nonparametric R-squared's have similar performance for both $n = 50$ and
$n = 200$, and hence the plots are omitted for brevity. The values for the
ANOVA R-squared are slightly smaller than $R_\rho^2$ and $R_s^2$; for example, when
$\sigma = 0.5$, the average $R_1^2$ is 0.8155 (sd 0.0325), 0.8273 (sd 0.0323) for $R_\rho^2$, and
0.8337 (sd 0.0337) for $R_s^2$.

Example 2. Twisted pear model: $Y = 5 + 0.1Xe^{(5 - 0.5X)} + \frac{(1 + 0.5X)}{3}\sigma\varepsilon$,
where $X \sim N(1.2, (1/3)^2)$ and $\varepsilon \sim N(0, 1)$. $X$ and $\varepsilon$ are independent, and the
values of $\sigma$ are the same as in Example 1. The original model from Doksum
and Froda [6] did not include the constant 5. We add a nonzero constant in
the model for convenience of performing F-tests in Section 4.2. This model
represents a situation where the relationship between $X$ and $Y$ is strong for
small $X$, but then tapers off as the noise variance increases. Figure 1 gives
the boxplots for $n = 50$. Clearly both the unadjusted and adjusted ANOVA
R-squared's behave much more stably than $R_\rho^2$ and $R_s^2$. When $\sigma = 0.5$, the
values of mean (sd) are 0.9512 (0.0195), 0.9444 (0.0216), 0.8587 (0.1662) and
0.8730 (0.1752) for $R_1^2$, adjusted $R_{1,\mathrm{adj}}^2$, $R_\rho^2$ and $R_s^2$, respectively. Both $R_\rho^2$ and
$R_s^2$ have a skewed distribution for this heteroscedastic model. Similar results
can be observed for the case of $\sigma = 1$. When $\sigma = 4$, we note that there is one
negative Rg and four negative R^ , which are not guaranteed to lie between 
0 and 1. The results for $n = 200$ are similar to those of $n = 50$ and hence
are omitted. This example demonstrates some advantages of the ANOVA 
R-squared in a heteroscedastic model as compared to other nonparametric 
coefficients of determination. 



4.2. Simulation results for the F-test of no effect. Due to boundary
effects in practice, we adopt a more conservative version of the F-statistic,
defined as

(4.1) $F(h) = \dfrac{SSR_p(h)/(\mathrm{tr}(H^*) - 1)}{(\sum_i (Y_i - \bar Y)^2 - SSR_p(h))/(n - \mathrm{tr}(H^*))},$

Fig. 1. Example 2. Boxplots for 400 trials of five different R-squared's with $n = 50$; (1)
ANOVA $R_1^2$, (2) adjusted ANOVA $R_1^2$, (3) R-squared from fitting a simple linear model,
(4) $R_\rho^2$ (Doksum and Samarov [5]) and (5) empirical $R_s^2$.



where $SSR_p(h)$ is estimated based on (2.8) without any boundary
adjustment. Note that in the denominator of (4.1), $(\sum_i (Y_i - \bar Y)^2 - SSR_p(h))$ is used
instead of $SSE_p(h)$. Examples 1 and 2 with $\sigma = 1$ are modified as Examples 3
and 4 to illustrate the proposed F-test. For each example, three fixed values
of the bandwidth are used: Example 3, $h = 0.15$, 0.22 and 0.34, and Example
4, $h = 0.22$, 0.34 and 0.51, with a ratio of roughly 1.5. The F-test statistic
in (4.1) is calculated and its p-value is obtained using the F-distribution
with degrees of freedom $(\mathrm{tr}(H^*) - 1,\; n - \mathrm{tr}(H^*))$. A significance level 0.05 is
used to determine whether to reject the null hypothesis or not. Again
sample sizes $n = 50$ and $n = 200$ are used with 400 simulations. For comparison
only, we also include another F-flavor test, the pseudo-likelihood ratio test
(PLRT) for no effect by Azzalini and Bowman [2], in which a chi-squared
distribution was calibrated to obtain the p-value.

Example 3. Consider the model: $Y = 2 - a \times (X - e^{-100(X-0.5)^2}) + \varepsilon$, 
where a = 0, 0.5, ..., 3, X ~ Uniform(0, 1) and $\varepsilon \sim N(0, 1)$. The case of a = 0 
gives the intercept only model, while Example 1 corresponds to a = 5. Figure 
2(a) illustrates the shapes of the true regression functions. The proportions 
of rejection by the F-statistic (4.1) and PLRT are plotted in Figure 2(b)-(d) 
as a function of a. With the conservative statistic (4.1), all type-I errors 
of the F-test are below the 5% level. The PLRT has slightly better power than 
the F-test when n = 50, and the two tests behave similarly for n = 200. Both 
tests gain power as the bandwidth increases, while the type-I error of the 
PLRT exceeds the 0.05 level when h = 0.34. 
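
To make the simulation design concrete, here is a minimal sketch of how one data set from Example 3 could be generated. It assumes the regression function reads $m(x) = 2 - a(x - e^{-100(x-0.5)^2})$; the function names `m_example3` and `simulate_example3` and the seeding scheme are hypothetical and not taken from the paper.

```python
import math
import random


def m_example3(x, a):
    """Assumed regression function: 2 - a * (x - exp(-100 (x - 0.5)^2))."""
    return 2.0 - a * (x - math.exp(-100.0 * (x - 0.5) ** 2))


def simulate_example3(a, n, seed=0):
    """Draw n pairs (X, Y) with X ~ Uniform(0, 1) and N(0, 1) errors."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [m_example3(x, a) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


# One null data set (a = 0): the regression function is the constant 2.
xs, ys = simulate_example3(a=0.0, n=200, seed=1)
print(len(xs), sum(ys) / len(ys))
```

Repeating this for each value of a and each bandwidth, and recording how often the test rejects at the 0.05 level, would give rejection proportions of the kind plotted in Figure 2(b)-(d).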

Example 4. Consider the following model: Y = 5 + aXe^^~^-^^^ + 
(i+o^5X) ^^ ^^^^^ a = 0,0.01,..., 0.06, X ~ iV(1.2(l/3)2), and £^N{0,l'^). 
For this heteroscedastic model, a = 0 corresponds to the null hypothesis, 
and Example 2 corresponds to a = 0.1. We note that neither of the two tests 
is formally applicable, but we want to examine their robustness against 
deviations from homoscedasticity. A plot of the true regression functions is 
given in Figure 2(e), and the percentages of rejection over 400 simulations 
are given in Figure 2(f)-(h). As in Example 3, the PLRT has slightly better 
power than the F-test when n = 50. We observe a less accurate approximation 
of the type-I error by the PLRT when n = 200: 7.75%, 6.5% and 
6.25% for h = 0.22, 0.34 and 0.51, respectively (the corresponding numbers 
are 4.5%, 4% and 4% for the F-test). This inflated type-I error may explain 
the PLRT's apparently better power when a = 0.01 and 0.02. This example 
shows that even under a heteroscedastic error structure, both tests perform 
reasonably well. 

5. Real data. The vineyard data from Simonoff [25] were obtained in Lake 
Erie and contain 52 rows numbered consecutively from the northwest (row 1) to 
the southeast (row 52), together with the sum of yields of the harvest in 
1989, 1990 and 1991, as measured by the total number of lugs (a lug is 
roughly 30 pounds of grapes). Figure 3(a) shows the data and the local linear 
estimates at grid 
Fig. 2. Examples 3 and 4. (a) Plot of the true regression function curves for Example 
3, a = 0, 0.5, ..., 3; (b) the percentages of rejection for simulated data in Example 3 based 
on 400 simulations with h = 0.15; ANOVA F-test (solid line), PLRT (long dash line), 
and the short dash line indicates the 5% significance level; + (n = 200); o (n = 50); (c) 
same as in (b) except h = 0.22; (d) same as in (b) except h = 0.34; (e) plot of the true 
regression function curves for Example 4, a = 0, 0.01, ..., 0.06; (f)-(h): same as in (b)-(d) 
for Example 4 with h = 0.22, 0.34 and 0.51, respectively. 




Fig. 3. (a) Scatterplot of total lug counts versus row for the vineyard data with local 
linear estimates h = 3 (solid line) and h = 1.5 (dashed line). (b) Plot of the corresponding 
pointwise $R_1^2(x)$. 



points 1, 1.5, ..., 52, with the Gaussian kernel and bandwidth h = 3 (solid 
line) and h = 1.5 (dashed line). The choice of bandwidth follows Simonoff 
[25]. The dip in yield around rows 30-40 is possibly due to a farmhouse 
directly opposite those rows (Simonoff [25]). The coefficients of determination 
Table 2 

ANOVA table for vineyard data with bandwidth 3 

Source       Degree of freedom   Sum of squares                   Mean squares       F 
Regression   (7.5509 - 1)        SSR_p = 2204.6682/52             2204.6682/6.5509   F = 15.3299 
Residual     (52 - 7.5509)       SSE_p = 391.2489/52              391.2489/44.4491 
Total        51                  SSR_p + SSE_p = 2595.9171/52 
                                 SST = 3180.4808/52 


indicate good explanatory power: when h = 3, $R_1^2$, $R_\rho^2$ and the empirical 
$R^2$ are 0.8493, 0.9414, 0.8854, respectively; when h = 1.5, 0.9046, 0.9638 
and 0.9297. The corresponding pointwise R-squared is shown in Figure 3(b). 
The curve with h = 1.5 has a larger pointwise $R_1^2(x)$ in most locations than 
that of h = 3. The local R-squared with h = 3 is only 40-50% for rows 31-34, 
and above 90% for rows 12-23 and 46-52, reflecting some difference across 
data in the proportion of variation explained by local linear regression. The 
difference leads to the idea of using the local R-squared for variable 
bandwidth selection in a future paper. The ANOVA tables for h = 3 and 1.5 are 
given in Tables 2 and 3. As expected, the $SSR_1$ of h = 1.5 is greater than that of 
h = 3. Both p-values of the ANOVA F-statistic (4.1) are <10~^, indicating 
rejection of the null hypothesis. The PLRT also gives very small p- values, 
4.3 X 10~^ and 1.33 x 10~^ for h = 1.5 and 3, respectively. Note that due to 
boundary effects, $SSR_p(h) + SSE_p(h)$ does not equal the sample variance of 
Y. We give both quantities in the ANOVA tables to illustrate this effect in 
practice. 

6. Discussion. Though the idea of nonparametric ANOVA inference is 
not new, we believe that the work in this paper provides a unified framework 
with an asymptotic geometric configuration for the first time. The proposed 
ANOVA tools for LPR are easy to carry out in practice and we hope that 
the methodology will be useful for data analysis. It will be interesting to 
explore a similar ANOVA framework for other nonparametric regression 
methods such as penalized splines in future studies. The ground-breaking 
points are the elegant local ANOVA decomposition (2.5) and construction 



Table 3 

ANOVA table for vineyard data with bandwidth 1.5 

Source       Degree of freedom   Sum of squares                   Mean squares        F 
Regression   (14.8095 - 1)       SSR_p = 2461.9695/52             2461.9695/13.8095   F = 8.9798 
Residual     (52 - 14.8095)      SSE_p = 259.5325/52              259.5325/36.1905 
Total        51                  SSR_p + SSE_p = 2721.5020/52 
                                 SST = 3180.4808/52 
of global ANOVA quantities through integrating local counterparts. Thus 
LPR, fitting local polynomials across data, may be viewed as a "calculus" 
extension of classical polynomial models. A surprising by-product is that, 
for local linear regression, the projected response $H^*\mathbf{y}$ has a bias of 
order $h^4$, which is smaller than the usual order $h^2$. The proposed projection 
matrix $H^*$ overcomes the problem of a nonsymmetric smoother matrix of local 
linear regression, and we show that it has nice geometric properties that 
lead to a natural F-test for no effect. $H^*$ also provides a new geometric 
view of LPR: for example, in the case of local linear regression, the local 
fitting at x is to project $\mathbf{y}$ onto the local column space of X and the 
locally projected values are $\hat\beta_0(x) + \hat\beta_1(x)(X_i - x)$, 
i = 1, ..., n; these locally projected values at different grid points around 
$X_i$ are then combined through weighted integration to form the projected 
value $Y_i^*$ [see (2.15)]. The projection view and the geometric representation 
of the ANOVA quantities offer new insights for LPR. The proposed F- 
test shares the property of the "Wilks phenomenon" with the generalized 
likelihood ratio test (Fan, Zhang and Zhang [10]), in that it does not depend 
on nuisance parameters. The numerical results presented in the paper show 
that the test statistic under the null hypothesis follows well the asymptotic 
F-distribution without further calibration; one does not have to simulate 
the null distributions to obtain the critical value. The paper also presents 
a brief multivariate extension of nonparametric ANOVA inference to VCM; 
more details will be developed in a separate paper. Based on findings in this 
paper, several follow-up problems are being investigated, including extension 
of the F-test to test for a polynomial relationship (Huang and Su [16]), 
and ANOVA inference for partial linear models and generalized additive 
models. We are also interested in applying the ANOVA approach to study 
the bandwidth selection problem, for example, using the local R-squared for 
variable bandwidth selection, and using the classical model selection criteria 
of AIC and BIC with the proposed $SSE_p(h)$ and degree of freedom $\mathrm{tr}(H^*)$ 
for global bandwidth selection. 
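
The local projection view can be illustrated numerically. The sketch below fits a local line at a single point x0 by kernel-weighted least squares and checks that the kernel-weighted total sum of squares splits exactly into regression and error parts, which is the kind of identity the local ANOVA decomposition (2.5) is understood to express. It makes several assumptions: the Gaussian weights are left unnormalized and no division by a density estimate is applied (these factors cancel in the ratio), and the names `local_linear_fit` and `local_anova` as well as the toy data are hypothetical.

```python
import math


def local_linear_fit(xs, ys, x0, h):
    """Weighted least squares fit of y on (1, x - x0) with Gaussian weights."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 * s1
    # Solve the 2 x 2 weighted normal equations in closed form.
    beta0 = (s2 * t0 - s1 * t1) / det
    beta1 = (s0 * t1 - s1 * t0) / det
    return beta0, beta1, w


def local_anova(xs, ys, x0, h):
    """Kernel-weighted SST, SSR and SSE at x0, plus the local R-squared."""
    beta0, beta1, w = local_linear_fit(xs, ys, x0, h)
    ybar = sum(ys) / len(ys)
    # Locally projected values beta0 + beta1 * (X_i - x0).
    fitted = [beta0 + beta1 * (x - x0) for x in xs]
    sst = sum(wi * (y - ybar) ** 2 for wi, y in zip(w, ys))
    ssr = sum(wi * (f - ybar) ** 2 for wi, f in zip(w, fitted))
    sse = sum(wi * (y - f) ** 2 for wi, y, f in zip(w, ys, fitted))
    return sst, ssr, sse, ssr / sst


# Toy data: a smooth curve plus a small deterministic perturbation.
xs = [i / 20 for i in range(21)]
ys = [math.sin(3 * x) + 0.1 * ((-1) ** i) for i, x in enumerate(xs)]
sst, ssr, sse, r2 = local_anova(xs, ys, x0=0.5, h=0.2)
print(sst, ssr + sse, r2)
```

The decomposition is exact because the weighted residuals are orthogonal to both local regressors, the constant and (X_i - x0), so the cross term vanishes for any centering constant.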



APPENDIX 

Proofs of Theorems 3, 4 and 6 are included in this section. The following 
lemma by Mack and Silverman [18] will be needed. 

Lemma A.1. Assume that $E|Y|^s < \infty$ and $\sup_x \int |y|^s f(x,y)\,dy < \infty$, 
where f(x,y) denotes the joint density of (X, Y). Let K be a bounded 
positive function with a bounded support, satisfying a Lipschitz condition, 
and D the support for the marginal density of X. Then 

$$\sup_{x \in D}\Bigl| n^{-1}\sum_{i=1}^{n}\{K_h(X_i - x)Y_i - E[K_h(X_i - x)Y_i]\}\Bigr| = O_P[\{nh/\log(1/h)\}^{-1/2}],$$

provided that $n^{2\varepsilon-1}h \to \infty$ for some $\varepsilon < 1 - s^{-1}$. 

Proof of Theorem 3. For the ith element $Y_i^*$, under Conditions (A1) 
and (A2), 

$$\begin{aligned}
Y_i^* - m(X_i) &= \int \Bigl\{\sum_{j=0}^{p}\hat\beta_j(x)(X_i - x)^j\Bigr\} K_h(X_i - x)\,dx - \int m(X_i)K_h(X_i - x)\,dx \\
&= \int \bigl((\hat\beta_0(x) - \beta_0(x)) + \cdots + (\hat\beta_p(x) - \beta_p(x))(X_i - x)^p\bigr) K_h(X_i - x)\,dx \\
&\quad - \int \bigl(\beta_{p+1}(x)(X_i - x)^{p+1} + r(x, X_i)\bigr) K_h(X_i - x)\,dx,
\end{aligned} \tag{A.1}$$

where $r(x, X_i)$ denotes the remainder terms after a (p + 1)st-order Taylor 
expansion. By using the bias expression from Wand and Jones [26], for 
example, when p is odd, 

$$E\Bigl\{\int (\hat\beta_0(x) - \beta_0(x))K_h(X_i - x)\,dx \,\Big|\, X_1, \ldots, X_n\Bigr\} = h^{p+1}b_{0,p}(x)(1 + o_P(1)),$$

and similarly for $\int (\hat\beta_j(x) - \beta_j(x))(X_i - x)^j K_h(X_i - x)\,dx$, $j \ge 1$. With 

$$\int \beta_{p+1}(x)(X_i - x)^{p+1}K_h(X_i - x)\,dx = \frac{1}{(p+1)!}h^{p+1}\mu_{p+1}m^{(p+1)}(X_i)(1 + o_P(1)),$$

the asymptotic conditional bias of $Y_i^*$ in (2.12) is obtained when p is odd. 
The case for an even p follows analogously. For local linear regression in part 
(b), the $h^2$-order terms are canceled, and the asymptotic conditional bias 
follows from further expansion of (A.1). 

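For part (c), denote the conditional bias vector of $\mathbf{y}^*$ by 
$\mathbf{b} = H^*\mathbf{m} - \mathbf{m}$. Then 

$$E\{H^*\mathbf{y}^* \mid X_1, \ldots, X_n\} - \mathbf{m} = H^*(\mathbf{m} + \mathbf{b}) - \mathbf{m} = \mathbf{b} + H^*\mathbf{b}. \tag{A.2}$$

The rate of $\mathbf{b} = H^*\mathbf{m} - \mathbf{m}$ is given in (2.12) and (2.13). It remains to 
investigate the rate of elements in $H^* = (h^*_{ij})$. The (j, k)th element of 
the $(X^TWX)$ matrix is $s_{j,k}(x) = \sum_i (X_i - x)^{j+k-2}K_h(X_i - x)/f(x) = nh^{j+k-2}(\mu_{j+k-2} + h\mu_{j+k-1}f'(x)/f(x))(1 + o_P(1))$ 
by Lemma A.1, and 

$$X^TWX = nD(S_p + h\tilde{S}_p f'(x)/f(x) + o_P(h^2))D, \tag{A.3}$$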
where $D = \mathrm{diag}(1, h, \ldots, h^p)$, $S_p = (\mu_{i+j-2})_{1 \le i,j \le (p+1)}$ and 
$\tilde{S}_p = (\mu_{i+j-1})_{1 \le i,j \le (p+1)}$. Then $(X^TWX)^{-1} = n^{-1}D^{-1}S_p^{-1}D^{-1}(1 + o_P(1))$. 
Denote $S_p^{-1} = (s^{ij})$ and $s^{ij}$ is of order O(1). Then before integrating 
over x, the ith diagonal element of $WHf(x) = WX(X^TWX)^{-1}X^TWf(x)$ has a form: 
E E(i^ft(X, - x)fiX, - + op(l)), 

i=0 fc=0 

which is of order 0(n~^/i~^)(l + op(l)). After integration, the rate for h*^ 
remains 0{n~^h~^){l + op(l)). We next show that the rate of nondiagonal 
elements of H* is of order 0(n~^)(l + op(l)). For i ^j, the integrand for 
h*j is 

p p 

1=0 k=0 

which is of order 0(n~^)(l + op(l)). Then results stated in (c) follow from (A. 2). 

For part (d), under the homoscedastic model, $\mathrm{Var}(\mathbf{y}^*) = H^{*2}\sigma^2$, and 
the conditional variance of $Y_i^*$ is $\sigma^2\sum_j h_{ij}^{*2}$. When i = j, $h^*_{ii}$ is of order 
$O(n^{-1}h^{-1})(1 + o_P(1))$. For $i \ne j$, 

X f(Xj}dXj{l+op{l)} 

Hence the asymptotic conditional variance of $Y_i^*$ is as given in (2.14). This 
completes the proof of Theorem 3. □ 

Proof of Theorem 4. Theorem 4(a) follows directly from Theorem 
3. For part (b), since $H^*$ is asymptotically idempotent, $(H^* - L)$ is 
asymptotically idempotent given that $(H^* - L)^2 = H^{*2} - L$. Therefore with the 
homoscedastic normality assumption under the no-effect null hypothesis, 
$SSR_p(h)$ has an asymptotic $\chi^2$-distribution with degree of freedom 
$(\mathrm{tr}(H^*) - 1)$. Similarly, $SSE_p(h)$ has an asymptotic $\chi^2$-distribution 
with degree of freedom $(n - \mathrm{tr}(H^*))$. With $(H^* - L)$ and $(I - H^*)$ being 
asymptotically orthogonal in part (a), the test statistic F in (2.16) has an 
asymptotic F-distribution with degrees of freedom $(\mathrm{tr}(H^*) - 1, n - \mathrm{tr}(H^*))$. 
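
The identity used in part (b) can be spelled out as a short worked step. The derivation below is a sketch under assumptions that are not restated in this section: L is taken to be the usual averaging matrix $n^{-1}\mathbf{1}\mathbf{1}^T$, $H^*$ is symmetric, and $H^*$ reproduces constants, $H^*\mathbf{1} = \mathbf{1}$.

```latex
% Assumptions: L = n^{-1} 1 1^T, H^* symmetric, H^* 1 = 1.
% Then H^* L = n^{-1} (H^* 1) 1^T = L, L H^* = (H^* L)^T = L, and L^2 = L.
\begin{aligned}
(H^* - L)^2 &= H^{*2} - H^* L - L H^* + L^2 \\
            &= H^{*2} - L - L + L \\
            &= H^{*2} - L .
\end{aligned}
```

Since $H^{*2}$ is asymptotically equal to $H^*$, the right-hand side is asymptotically $H^* - L$, which gives the asymptotic idempotency of $(H^* - L)$.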

For part (c), note that $\mathrm{tr}(H^TWf(x)) = \mathrm{tr}(f(x)WX(X^TWX)^{-1}X^TW) = \mathrm{tr}((X^TWX)^{-1}X^TW^2Xf(x))$. 
Using $S_p$ in (A.3) with p = 1, 
(2.17) is obtained. Therefore the proof of Theorem 4 is complete. □ 

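Proof of Theorem 6. We need the following notation: the error vector 
as $\mathbf{e} = (\sigma(U_1)\varepsilon_1, \ldots, \sigma(U_n)\varepsilon_n)^T$, the mean vector as 
$\mathbf{m} = (\mathbf{a}(U_1)^T\mathbf{X}_1, \ldots, \mathbf{a}(U_n)^T\mathbf{X}_n)^T$ with $\mathbf{X}_i = (1, X_{i2}, \ldots, X_{id})^T$, and 
$D = I_{p+1} \otimes \mathrm{diag}(1, \ldots, h^p)$. Let $\boldsymbol{\mu} = (\mu_{p+1}, \ldots, \mu_{2p+1})^T$ 
[recall $\mu_j = \int u^j K(u)\,du$]. It follows from (3.6) that 

$$R_p^2(h) - \eta^2 = \frac{1}{SST}\Bigl\{-(SSE_p(h) - E(\sigma^2(U))) + (SST - \sigma_Y^2)\frac{E(\sigma^2(U))}{\sigma_Y^2}\Bigr\}. \tag{A.4}$$

The first term in (A.4) can be expressed as 

$$SSE_p(h) - E(\sigma^2(U)) = \frac{1}{n}\int \mathbf{y}^T(W_u - W_uH_u)\mathbf{y}\,g(u)\,du - E(\sigma^2(U)) = I_1 + I_2 + I_3,$$

where $I_1 = \frac{1}{n}\int \mathbf{e}^T(W_u - W_uH_u)\mathbf{e}\,g(u)\,du - E(\sigma^2(U))$, 
$I_2 = \frac{1}{n}\int \mathbf{m}^T(W_u - W_uH_u)\mathbf{m}\,g(u)\,du$ and 
$I_3 = \frac{2}{n}\int \mathbf{e}^T(W_u - W_uH_u)\mathbf{m}\,g(u)\,du$. 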
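
The split into $I_1$, $I_2$ and $I_3$ is the standard expansion of a quadratic form. The following worked step is a sketch that assumes $H_u = X_u(X_u^TW_uX_u)^{-1}X_u^TW_u$ with $W_u$ diagonal, in analogy with the univariate case, so that $A_u = W_u - W_uH_u$ is symmetric; the shorthand $A_u$ is introduced here only for illustration.

```latex
% Assumption: A_u = W_u - W_u H_u is symmetric, and y = m + e.
\begin{aligned}
\mathbf{y}^T A_u \mathbf{y}
  &= (\mathbf{m} + \mathbf{e})^T A_u (\mathbf{m} + \mathbf{e}) \\
  &= \mathbf{e}^T A_u \mathbf{e} + \mathbf{m}^T A_u \mathbf{m}
     + 2\,\mathbf{e}^T A_u \mathbf{m} .
\end{aligned}
```

Integrating each term against $g(u)$, dividing by n and subtracting $E(\sigma^2(U))$ from the first term gives $I_1$, $I_2$ and $I_3$, respectively.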

For matrix H^, using Lemma A.l we find that X^VF„X„ = D{T(^Sp)D{l + 

op(i)) and Y.i=A^'\y^)y^lWu{Xik{Ui-uY+\...,Xr,k{Un-uy+^Y = 

D{T ® ^)a(P+i)^/iP+i(l + op(l)), where T = E{{Xi,.. .,Xdf{Xi, . . . ,Xrf)| 
U = u} and a^^^^^ = {af~^^\ . . . , a^^^^^)"^. The term I2 conditioned on {Xi, . . . , 
Xn} is nonrandom and asymptotically I2 = h?'^'^'^g{u) {^2p+2 ~ I^^S^ x 

^2p+2)(l + op(l)). Conditioned on {Xi, . . . , Xn} , I3 has a mean and its 
variance is of order /i^Cp+i). For h, assuming local homoscedasticity, 



E{a\U)) + Opin~^h'^). 



1 " r /■ 

The asymptotic conditional mean and variance for Ii are 

E{h\Xi,...,Xn) = h^Y J duil + op{l)), 

(A.6) Vavih\Xu...,Xn) = n'^J2[l ^\u)Kh{U^ - u) du 



n-'E{a\U))( / K*o{v)dv]{l + op{l)). 



It is clear that under the condition nh?'^'^'^ ^ 0, /i is the dominating term for 
{SSEpih) — E{a'^{U))). Further, the asymptotic conditional variance of Ii is 
dominated by (A.6) since Op{n^'^h'^'^) in (A.5) is smaher than (A.6) under 
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the condition that n/i^ — > oo. Using Theorem 8.16 in Lehmann and Casella 
[17], $\sqrt{n}(SST - \sigma_Y^2)$ has the asymptotic normality $N(0, \mathrm{Var}[(Y - \mu_Y)^2])$. Then 
from (A.4), the asymptotic conditional variance of $(R_p^2(h) - \eta^2)$ is obtained. 

Last we establish the asymptotic normality of $R_p^2(h)$ in the homoscedastic 
case. Since $SST \to \sigma_Y^2$ with probability 1, (A.4) becomes 

$$(R_p^2(h) - \eta^2) = \Bigl\{-\frac{1}{\sigma_Y^2}I_1(1 + o_P(1)) + (SST - \sigma_Y^2)\frac{\sigma^2}{\sigma_Y^4}\Bigr\}.$$

$I_1$ is a summation of i.i.d. random variables $\varepsilon_i^2$, i = 1, ..., n, and hence by 
the central limit theorem, $I_1$ has an asymptotic normal distribution. It is 
easy to show that the covariance of $I_1$ and SST conditioned on $X_1, \ldots, X_n$ 
is of smaller order than the sum of variances of $I_1$ and $(SST - \sigma_Y^2)$. Thus, 
the asymptotic normality for $R_p^2(h)$ is obtained. The results in part (d) are 
easily seen from asymptotic normality of $I_1$. □ 
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